independence criterion
Learning Treatment Representations for Downstream Instrumental Variable Regression
Lin, Shiangyi, Lan, Hui, Syrgkanis, Vasilis
Traditional instrumental variable (IV) estimators face a fundamental constraint: they can only accommodate as many endogenous treatment variables as available instruments. This limitation becomes particularly challenging in settings where the treatment is presented in a high-dimensional and unstructured manner (e.g. descriptions of patient treatment pathways in a hospital). In such settings, researchers typically resort to applying unsupervised dimension reduction techniques to learn a low-dimensional treatment representation prior to implementing IV regression analysis. We show that such methods can suffer from substantial omitted variable bias due to implicit regularization in the representation learning step. We propose a novel approach to construct treatment representations by explicitly incorporating instrumental variables during the representation learning process. Our approach provides a framework for handling high-dimensional endogenous variables with limited instruments. We demonstrate both theoretically and empirically that fitting IV models on these instrument-informed representations ensures identification of directions that optimize outcome prediction. Our experiments show that our proposed methodology improves upon the conventional two-stage approaches that perform dimension reduction without incorporating instrument information.
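For context on the constraint mentioned in the abstract above, the following is a minimal sketch of textbook two-stage least squares (2SLS) on simulated data. It is not the authors' representation-learning method. The function name `two_stage_least_squares`, the simulated data-generating process, and the true effect of 2.0 are all assumptions made for illustration.

```python
import numpy as np

def two_stage_least_squares(y, t, z):
    """Textbook 2SLS (illustrative sketch, not the paper's method).

    y: outcomes (n,); t: endogenous treatments (n,) or (n, d_t);
    z: instruments (n,) or (n, d_z). Returns [intercept, effects...].
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float).reshape(len(y), -1)
    z = np.asarray(z, dtype=float).reshape(len(y), -1)
    # Order condition: at least as many instruments as treatments.
    if z.shape[1] < t.shape[1]:
        raise ValueError("need at least as many instruments as treatments")
    ones = np.ones((len(y), 1))
    Z = np.hstack([ones, z])
    T = np.hstack([ones, t])
    # Stage 1: project the treatments onto the instruments.
    first_stage, *_ = np.linalg.lstsq(Z, T, rcond=None)
    T_hat = Z @ first_stage
    # Stage 2: regress the outcome on the projected treatments.
    coef, *_ = np.linalg.lstsq(T_hat, y, rcond=None)
    return coef

# Hypothetical simulation: u is an unobserved confounder, z a valid instrument.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
t = z + u + 0.5 * rng.normal(size=n)
y = 2.0 * t + 3.0 * u + rng.normal(size=n)  # assumed true effect: 2.0

X_ols = np.column_stack([np.ones(n), t])
beta_ols = np.linalg.lstsq(X_ols, y, rcond=None)[0][1]  # confounded estimate
beta_iv = two_stage_least_squares(y, t, z)[1]           # instrumented estimate
```

In this simulated setup the OLS slope absorbs the confounder, while the 2SLS slope should land near the assumed effect. The order-condition check is the limitation that motivates learning a low-dimensional treatment representation when instruments are scarce.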
Conditional Dependence via U-Statistics Pruning
de Cabrera, Ferran, Vilà-Insa, Marc, Riba, Jaume
The problem of measuring conditional dependence between two random phenomena arises when a third one (a confounder) has a potential influence on the amount of information shared by the original pair. A typical issue in this challenging problem is the inversion of ill-conditioned autocorrelation matrices. This paper presents a novel measure of conditional dependence based on incomplete unbiased statistics of degree two, which allows independence to be reinterpreted as uncorrelatedness on a finite-dimensional feature space. This formulation makes it possible to prune data according to the observations of the confounder itself, thus avoiding matrix inversions altogether. Moreover, the proposed approach is articulated as an extension of the Hilbert-Schmidt independence criterion, which becomes expressible through kernels that operate on 4-tuples of data.
Towards Independence Criterion in Machine Unlearning of Features and Labels
Han, Ling, Luo, Nanqing, Huang, Hao, Chen, Jing, Hartley, Mary-Anne
This work delves into the complexities of machine unlearning in the face of distributional shifts, particularly focusing on the challenges posed by non-uniform feature and label removal. With the advent of regulations like the GDPR emphasizing data privacy and the right to be forgotten, machine learning models face the daunting task of unlearning sensitive information without compromising their integrity or performance. Our research introduces a novel approach that leverages influence functions and principles of distributional independence to address these challenges. By proposing a comprehensive framework for machine unlearning, we aim to ensure privacy protection while maintaining model performance and adaptability across varying distributions. Our method not only facilitates efficient data removal but also dynamically adjusts the model to preserve its generalization capabilities. Through extensive experimentation, we demonstrate the efficacy of our approach in scenarios characterized by significant distributional shifts, making substantial contributions to the field of machine unlearning. This research paves the way for developing more resilient and adaptable unlearning techniques, ensuring models remain robust and accurate in the dynamic landscape of data privacy and machine learning.
A statistical approach to detect sensitive features in a group fairness setting
Pelegrina, Guilherme Dean, Couceiro, Miguel, Duarte, Leonardo Tomazeli
The use of machine learning models in decision support systems with high societal impact has raised concerns about unfair (disparate) results for different groups of people. When evaluating such unfair decisions, one generally relies on predefined groups that are determined by a set of features that are considered sensitive. However, such an approach is subjective and guarantees neither that these features are the only ones to be considered sensitive nor that they entail unfair (disparate) outcomes. In this paper, we propose a preprocessing step to address the task of automatically recognizing sensitive features that does not require a trained model to verify unfair results. Our proposal is based on the Hilbert-Schmidt independence criterion, which measures the statistical dependence of variable distributions. We hypothesize that if the dependence between the label vector and a candidate is high for a sensitive feature, then the information provided by this feature will entail disparate performance measures between groups. Our empirical results support our hypothesis and show that several features considered as sensitive in the literature do not necessarily entail disparate (unfair) results.
Entropy Regularized Optimal Transport Independence Criterion
Liu, Lang, Pal, Soumik, Harchaoui, Zaid
Statistical independence measures have been widely used in machine learning and statistics, ranging from independent component analysis (Bach and Jordan, 2002; Gretton et al., 2005) to causal inference (Pfister et al., 2018; Chakraborty and Zhang, 2019), and recently in self-supervised learning (Li et al., 2021) and representation learning (Ozair et al., 2019). Classical dependence measures such as Pearson's correlation coefficient, Spearman's ρ, and Kendall's τ (Hoeffding, 1948; Kruskal, 1958; Lehmann, 1966) focus on real-valued one-dimensional random variables and thus are not suitable for high-dimensional data; see also (Schweizer and Wolff, 1981; Nikitin, 1995). One popular choice of independence measure in high dimensions is the Hilbert-Schmidt independence criterion (HSIC) (Gretton et al., 2005). This criterion was used to develop an independence test by Gretton et al. (2007b). Several extensions of HSIC are available, such as a relative dependency measure (Bounliphone et al., 2015) and a joint independence measure among multiple random elements (Pfister et al., 2018). Another choice is the distance covariance (dCov) of Székely et al. (2007).
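As a rough illustration of the HSIC mentioned above, the sketch below computes a biased (V-statistic) empirical estimate, trace(KHLH) / n^2, with Gaussian kernels. The helper names `gaussian_gram` and `hsic_biased`, the fixed bandwidth of 1.0, and the simulated data are assumptions for illustration. It is not the entropy-regularized optimal transport criterion proposed in this paper, and other normalizations of the estimator appear in the literature.

```python
import numpy as np

def gaussian_gram(x, bandwidth):
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def hsic_biased(x, y, bandwidth_x=1.0, bandwidth_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2, with H the centering matrix."""
    K = gaussian_gram(x, bandwidth_x)
    L = gaussian_gram(y, bandwidth_y)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / n ** 2

# Hypothetical data: y_dep depends on x nonlinearly, y_ind is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y_dep = x ** 2 + 0.1 * rng.normal(size=300)
y_ind = rng.normal(size=300)

hsic_dep = hsic_biased(x, y_dep)
hsic_ind = hsic_biased(x, y_ind)
pearson_dep = float(np.corrcoef(x, y_dep)[0, 1])  # linear correlation, for contrast
```

The quadratic relationship is chosen because it is a purely nonlinear dependence: Pearson's correlation is near zero at the population level, while the kernel-based statistic should separate the dependent pair from the independent one. In practice the bandwidths are usually tuned (for example with a median heuristic) rather than fixed.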
Towards Learning an Unbiased Classifier from Biased Data via Conditional Adversarial Debiasing
Reimers, Christian, Bodesheim, Paul, Runge, Jakob, Denzler, Joachim
Bias in classifiers is a severe issue of modern deep learning methods, especially for their application in safety- and security-critical areas. Often, the bias of a classifier is a direct consequence of a bias in the training dataset, frequently caused by the co-occurrence of relevant features and irrelevant ones. To mitigate this issue, we require learning algorithms that prevent the propagation of bias from the dataset into the classifier. We present a novel adversarial debiasing method, which addresses a feature that is spuriously connected to the labels of training images but statistically independent of the labels for test images. Thus, the automatic identification of relevant features during training is perturbed by irrelevant features. This is the case in a wide range of bias-related problems for many computer vision tasks, such as automatic skin cancer detection or driver assistance. We argue by a mathematical proof that our approach is superior to existing techniques for the abovementioned bias. Our experiments show that our approach performs better than state-of-the-art techniques on a well-known benchmark dataset with real-world images of cats and dogs.
Sobolev Independence Criterion
Mroueh, Youssef, Sercu, Tom, Rigotti, Mattia, Padhi, Inkit, Santos, Cicero Dos
We propose the Sobolev Independence Criterion (SIC), an interpretable dependency measure between a high-dimensional random variable X and a response variable Y. SIC decomposes into a sum of feature importance scores and hence can be used for nonlinear feature selection. SIC can be seen as a gradient-regularized Integral Probability Metric (IPM) between the joint distribution of the two random variables and the product of their marginals. We use sparsity-inducing gradient penalties to promote input sparsity of the critic of the IPM. In the kernel version we show that SIC can be cast as a convex optimization problem by introducing auxiliary variables that play an important role in feature selection as they are normalized feature importance scores. We then present a neural version of SIC where the critic is parameterized as a homogeneous neural network, improving its representation power as well as its interpretability. We conduct experiments validating SIC for feature selection in synthetic and real-world experiments. We show that SIC enables reliable and interpretable discoveries when used in conjunction with the holdout randomization test and knockoffs to control the False Discovery Rate. Code is available at http://github.com/ibm/sic.